NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


How Many Raters Can Be Enough: G Theory Applied to Assessment and Measurement of L2 Speech Perception

https://doi.org/10.32038/ltrq.2023.37.12

Hirschi, Kevin; Kang, Okim (October 2023, Language Teaching Research Quarterly)

This paper extends Generalizability Theory (G theory) to the measurement of extemporaneous L2 speech from the perspective of speech perception. Using six datasets from previous studies, it reports G studies, which decompose measurement variance into its sources, and D studies, which predict how reliability changes when the number of raters, items, or other facets is modified. Together, these analyses help the field adopt measurement designs that include comprehensibility, accentedness, and intelligibility. When data from a single audio sample per learner were subjected to D studies, both semantic differential and rubric scales for comprehensibility reached a reliability of .90 with about 15 trained raters or 50 untrained crowdsourced raters. To support generalizable and dependable evaluations, the paper gives empirically informed recommendations for various assessment and research purposes, including the number of speech samples rated and the granularity of the scales.
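The D-study logic in the abstract above can be illustrated with the simplest case: a crossed persons × raters (p × r) design with one speech sample per learner. The sketch below uses the textbook formulas for the relative G coefficient, E(ρ²) = σ²_p / (σ²_p + σ²_pr,e / n_r), and the absolute dependability coefficient Φ, then searches for the smallest number of raters that reaches a target of .90. The variance components in the usage block are hypothetical, not values from the paper, and the paper's own designs may include further facets (e.g., items or samples).

```python
def g_coefficient(var_p: float, var_pr_e: float, n_raters: int) -> float:
    """Relative G coefficient E(rho^2) for a crossed persons x raters (p x r) design."""
    return var_p / (var_p + var_pr_e / n_raters)


def phi_coefficient(var_p: float, var_r: float, var_pr_e: float, n_raters: int) -> float:
    """Absolute dependability coefficient Phi: rater main-effect variance also counts as error."""
    return var_p / (var_p + (var_r + var_pr_e) / n_raters)


def raters_needed(var_p: float, var_pr_e: float, target: float = 0.90,
                  max_raters: int = 1000) -> int:
    """Smallest number of raters whose projected G coefficient reaches the target."""
    for n in range(1, max_raters + 1):
        # Small tolerance so floating-point rounding does not skip an exact hit.
        if g_coefficient(var_p, var_pr_e, n) >= target - 1e-12:
            return n
    raise ValueError("target not reachable within max_raters")


if __name__ == "__main__":
    # Hypothetical variance components (NOT taken from the paper).
    var_p, var_r, var_pr_e = 1.0, 0.2, 1.5
    for n in (1, 5, 10, 15):
        g = g_coefficient(var_p, var_pr_e, n)
        phi = phi_coefficient(var_p, var_r, var_pr_e, n)
        print(f"{n:>2} raters: G = {g:.3f}, Phi = {phi:.3f}")
    print("raters needed for G >= .90:", raters_needed(var_p, var_pr_e))
```

A linear search is used instead of solving n ≥ target · σ²_pr,e / ((1 − target) · σ²_p) in closed form, because rounding in the closed-form division can push an exact integer answer up by one.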
Fluency Benchmarks and Impacts of Practice with Instantaneous Assessment on International Teaching Assistants’ Speech Rate and Pause Units

https://doi.org/10.31274/psllt.15711

Hirschi, Kevin; Kang, Okim; Hansen, John; Looney, Stephen D. (July 2023, Iowa State University Digital Press)

Pronunciation Assessment Criteria and Intelligibility

Kang, Okim; Hirschi, Kevin (March 2023, Speak out)

Various aspects of second language (L2) speakers’ pronunciation can be considered in the oral assessment of speaker proficiency. Over time, both segmentals and suprasegmentals have been examined for their roles in judgments of accented speech. Descriptors in rating criteria often include a speaker’s intelligibility (i.e., the actual understanding of the utterance) or comprehensibility (i.e., ease of understanding) (Derwing & Munro, 2005). This paper discusses current issues and rating criteria in L2 pronunciation assessment and describes the prominent characteristics of L2 intelligibility. It also offers recommendations to inform assessment practices and curriculum development in L2 classrooms in the context of Global Englishes.
Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-level Goodness of Pronunciation Transformer

https://doi.org/10.21437/Interspeech.2023-2371

Shekar, Ram C.; Yang, Mu; Hirschi, Kevin; Looney, Stephen; Kang, Okim; Hansen, John H. (August 2023, ISCA INTERSPEECH-2023)
Automatic pronunciation assessment (APA) plays an important role in providing feedback to self-directed language learners in computer-assisted pronunciation training (CAPT). Several mispronunciation detection and diagnosis (MDD) systems have achieved promising performance based on end-to-end phoneme recognition. However, assessing the intelligibility of second language (L2) speech remains a challenging problem. One issue is the lack of large-scale labeled speech data from non-native speakers. Additionally, relying on only one aspect (e.g., accuracy) at the phonetic level may not provide a sufficient assessment of pronunciation quality and L2 intelligibility. Segmental (phonetic-level) features such as goodness of pronunciation (GOP) can be leveraged; however, their granularity may cause a mismatch with prosodic-level (suprasegmental) pronunciation assessment. In this study, a Wav2vec 2.0-based MDD system and a GOP feature-based Transformer are employed to characterize L2 intelligibility. An L2 speech dataset with human-annotated prosodic (suprasegmental) labels is used for multi-granular, multi-aspect pronunciation assessment and for identifying factors important to intelligibility in L2 English speech. The study compares automated pronunciation scores with the relationship between suprasegmental features and listener perceptions; taken together, these results can support the development of instantaneous assessment tools and solutions for L2 learners.
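To make the GOP feature mentioned above concrete, the sketch below computes one common posterior-based formulation for a single phone segment: the average, over the segment's frames, of the log ratio between the posterior of the canonical (expected) phone and the highest posterior of any phone. This is a generic illustration, not the paper's feature extractor; the frame posteriors, the phone indices, and the flagging threshold of -1.0 are all hypothetical, and in practice the segments would come from a forced alignment.

```python
import math
from typing import List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one row of phone posteriors per frame


def gop_score(frame_posteriors: Frames, canonical: int) -> float:
    """Posterior-based GOP for one phone segment.

    Averages log P(canonical | frame) - log max_q P(q | frame) over the frames.
    The score is <= 0; 0 means the canonical phone was the most likely phone
    in every frame. Posteriors must be strictly positive (e.g., softmax outputs).
    """
    if not frame_posteriors:
        raise ValueError("segment has no frames")
    total = 0.0
    for posteriors in frame_posteriors:
        total += math.log(posteriors[canonical]) - math.log(max(posteriors))
    return total / len(frame_posteriors)


def flag_segments(segments: Sequence[Tuple[Frames, int]],
                  threshold: float = -1.0) -> List[bool]:
    """Flag a segment as a likely mispronunciation when its GOP falls below a
    hypothetical threshold."""
    return [gop_score(frames, phone) < threshold for frames, phone in segments]


if __name__ == "__main__":
    # Hypothetical 3-phone posteriors for a two-frame segment.
    frames = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
    for phone in range(3):
        print(phone, round(gop_score(frames, phone), 3))
    print(flag_segments([(frames, 0), (frames, 2)]))
```

Because GOP is computed per phone, it illustrates the granularity issue raised in the abstract: it says nothing directly about stress, rhythm, or other suprasegmental properties that span several phones.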
Data‐Driven Learning for Pronunciation: Perception and Production of Lexical Stress and Prominence in Academic English

https://doi.org/10.1002/tesq.3302

Hirschi, Kevin; Kang, Okim (January 2024, TESOL Quarterly)

Issues of intelligibility may arise among English learners when acquiring new words and phrases in North American academic settings, perhaps in part because of the limited linguistic data available to learners for understanding patterns of language use. To address this, the paper examines the effects of Data‐Driven Learning for Pronunciation (DDLfP) on lexical stress and prominence in the US academic context. Sixty-five L2 English learners at North American universities completed a diagnostic test and a pretest with listening and speaking items, followed by four online lessons and a posttest on academic words and formulas (i.e., multi‐word sequences). Experimental group participants (n = 40) practiced with an audio corpus of highly proficient L2 speakers, while comparison group participants (n = 25) were given teacher‐created pronunciation materials. Logistic regression results indicated that the group who used the corpus significantly increased their recognition of prominence in academic formulas. In the spoken tasks, both groups improved their lexical stress pronunciation, but only the DDLfP learners improved their production of prominence in academic formulas. Learners reported that they valued DDLfP for pronunciation learning across contexts and speakers. The findings have implications for teachers of L2 pronunciation and support the use of corpora for language teaching and learning.
Characterization and normalization of second language speech intelligibility through lexical stress, speech rate, rhythm, and pauses

https://doi.org/10.1121/10.0016224

Kang, Okim; Hirschi, Kevin; Hansen, John H.; Looney, Stephen (October 2022, The Journal of the Acoustical Society of America)

While a range of measures based on speech production, language, and perception are possible for predicting and estimating speech intelligibility (Manun et al., 2020), what constitutes second language (L2) intelligibility remains under-defined. Prosodic and temporal features (i.e., stress, speech rate, rhythm, and pause placement) have been shown to affect listener perception (Kang et al., 2020), but their relationship with highly intelligible speech remains unclear. This study aimed to characterize L2 speech intelligibility. Acoustic analyses using Praat and Python scripts were conducted on 405 speech samples (30 s each) from 102 L2 English speakers with a wide variety of backgrounds, proficiency levels, and intelligibility levels. The results indicate that highly intelligible speakers of English produce between 2 and 4 syllables per second and that faster or slower rates are less intelligible. Silent pauses between 0.3 and 0.8 s were associated with the highest levels of intelligibility. Rhythm, measured by Δ syllable length of all content syllables, was marginally associated with intelligibility. Finally, lexical stress errors did not substantially interfere with intelligibility until fewer than 70% of polysyllabic words were stressed correctly. These findings inform first and second language research as well as language education and language pathology.
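The temporal benchmarks reported above (2 to 4 syllables per second; silent pauses of 0.3 to 0.8 s) can be applied to a sample once its syllable count and pause durations are known. The sketch below is a minimal, hypothetical profiler, not the authors' Praat or Python scripts. It assumes that speech rate is computed over the total sample duration with pauses included (definitions vary, e.g., articulation rate excludes pauses) and that pause durations have already been extracted, for instance from an aligned transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Bands reported in the abstract, used here only as descriptive reference points.
RATE_BAND: Tuple[float, float] = (2.0, 4.0)   # syllables per second
PAUSE_BAND: Tuple[float, float] = (0.3, 0.8)  # seconds


@dataclass
class SpeechSample:
    n_syllables: int
    duration_s: float  # total sample duration, silent pauses included (assumption)
    pause_durations_s: List[float] = field(default_factory=list)


def speech_rate(sample: SpeechSample) -> float:
    """Syllables per second over the whole sample, pauses included."""
    if sample.duration_s <= 0:
        raise ValueError("duration must be positive")
    return sample.n_syllables / sample.duration_s


def share_of_pauses_in_band(sample: SpeechSample,
                            band: Tuple[float, float] = PAUSE_BAND) -> float:
    """Proportion of silent pauses whose duration falls inside the band."""
    if not sample.pause_durations_s:
        return 0.0
    low, high = band
    hits = sum(1 for d in sample.pause_durations_s if low <= d <= high)
    return hits / len(sample.pause_durations_s)


def profile(sample: SpeechSample) -> Dict[str, object]:
    """Summarize a sample against the reported rate and pause bands."""
    rate = speech_rate(sample)
    return {
        "syllables_per_s": rate,
        "rate_in_band": RATE_BAND[0] <= rate <= RATE_BAND[1],
        "pause_share_in_band": share_of_pauses_in_band(sample),
    }


if __name__ == "__main__":
    # Hypothetical 30-s sample: 75 syllables and four silent pauses.
    sample = SpeechSample(n_syllables=75, duration_s=30.0,
                          pause_durations_s=[0.2, 0.5, 0.6, 1.2])
    print(profile(sample))
```

These checks are descriptive only: the study reports associations with intelligibility, so falling inside a band should not be read as a prediction that a given speaker is intelligible.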
Mobile-Assisted Pronunciation Training with Adult ESOL Learners: Background, Acceptance, Effort, and Accuracy

https://doi.org/10.31274/psllt.13272

Hirschi, Kevin; Kang, Okim; Hansen, John; Cucchiarini, Catia; Strik, Helmer (September 2022, PSLLT: Proceedings of 12th Pronunciation in Second Language Learning and Teaching Conference)

This study investigates the relationships between the background variables of adult English for Speakers of Other Languages (ESOL) learners and their use of a mobile app designed to build pronunciation skills, targeting features known to contribute to intelligibility. Recruited from free evening classes for English learners, 34 adult ESOL learners with mixed ESOL learning experiences, ages, lengths of residency, and first languages (L1s) completed six phoneme-pair lessons on the app, along with a background questionnaire and a technology acceptance survey (Venkatesh et al., 2012). A series of linear mixed-effects model (LMEM) analyses were performed on learner background variables, technology acceptance, learner effort, and accuracy. The results showed a minimal relationship between age, technology acceptance, and effort (7.68%) but a moderate to large relationship between age, technology acceptance, and the accuracy of consonants (39.70%) and vowels (64.26%). These results suggest that learners' use of mobile devices for L2 pronunciation training is moderated by various learner-related factors, and the findings offer supporting evidence for designing mobile-based applications for learners from a wide variety of backgrounds.
Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

https://doi.org/10.21437/Interspeech.2022-11039

Yang, Mu; Hirschi, Kevin; Looney, Stephen Daniel; Kang, Okim; Hansen, John H.L. (September 2022, Interspeech)

Using lexical stress, speech rate, rhythm, and pauses to characterize and normalize second language speech intelligibility

https://doi.org/10.1121/2.0001790

Kang, Okim; Hirschi, Kevin; Hansen, John; Looney, Stephen; Miao, Yongzhi (January 2022, Acoustical Society of America)

Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

Yang, Mu; Hirschi, Kevin; Looney, Stephen; Kang, Okim; Hansen, John (January 2022, Interspeech)

Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge for such end-to-end solutions is the scarcity of human-annotated phonemes in natural L2 speech. In this work, we leverage unlabeled L2 speech through a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model and fine-tune it on the original labeled L2 speech samples plus the generated pseudo-labeled L2 speech samples. Our pseudo labels are dynamic: they are produced on the fly by an ensemble of the online model, which makes our model robust to pseudo-label noise. We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and a 2.48% MDD F1 score improvement over a fine-tuning baseline that uses only labeled samples. The proposed PL method also outperforms conventional offline PL methods. Compared with state-of-the-art MDD systems, our MDD solution produces more accurate and consistent phonetic error diagnoses. In addition, we conduct an open test on a separate dataset, UTD-4Accents, where our system's recognition outputs show a strong correlation with human perception of accentedness and intelligibility.
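The "ensemble of the online model" described above is, in momentum pseudo-labeling, typically realized as an offline model whose weights are an exponential moving average (EMA) of the online model's weights; the offline model then labels unlabeled speech. The sketch below assumes that scheme and shows its two core steps on plain Python lists: the momentum weight update θ_off ← α·θ_off + (1 − α)·θ_on, and best-path (greedy) CTC decoding of frame-level phone probabilities into a pseudo-label sequence. The momentum value, the parameter layout, and the choice of greedy decoding are illustrative assumptions, not details confirmed by the abstract.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # hypothetical flat parameter layout


def momentum_update(offline: Params, online: Params, alpha: float = 0.999) -> Params:
    """EMA of weights: alpha * offline + (1 - alpha) * online, per parameter."""
    return {
        name: [alpha * w_off + (1.0 - alpha) * w_on
               for w_off, w_on in zip(offline[name], online[name])]
        for name in offline
    }


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    labels: List[int] = []
    prev = None
    for probs in frame_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        # A label is emitted only when it differs from the previous frame's argmax
        # and is not the blank symbol; a blank between repeats keeps both copies.
        if best != prev and best != blank:
            labels.append(best)
        prev = best
    return labels


if __name__ == "__main__":
    offline = {"w": [0.0, 1.0]}
    online = {"w": [1.0, 1.0]}
    print(momentum_update(offline, online, alpha=0.9))

    # Hypothetical per-frame probabilities over {blank, phone 1, phone 2}.
    frames = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
              [0.1, 0.6, 0.3], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
    print(ctc_greedy_decode(frames))
```

Because the offline weights change slowly, the pseudo labels evolve smoothly during training rather than being fixed once, which is the property the abstract contrasts with conventional offline PL.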

